Explainable artificial intelligence (XAI) for exploring spatial variability of lung and bronchus cancer (LBC) mortality rates in the contiguous USA

Machine learning (ML) has demonstrated promise in predicting mortality; however, understanding spatial variation in risk factor contributions to mortality rate requires explainability. We applied explainable artificial intelligence (XAI) on a stack-ensemble machine learning model framework to explore and visualize the spatial distribution of the contributions of known risk factors to lung and bronchus cancer (LBC) mortality rates in the conterminous United States. We used five base-learners—generalized linear model (GLM), random forest (RF), Gradient boosting machine (GBM), extreme Gradient boosting machine (XGBoost), and Deep Neural Network (DNN) for developing stack-ensemble models. Then we applied several model-agnostic approaches to interpret and visualize the stack ensemble model's output in global and local scales (at the county level). The stack ensemble generally performs better than all the base learners and three spatial regression models. A permutation-based feature importance technique ranked smoking prevalence as the most important predictor, followed by poverty and elevation. However, the impact of these risk factors on LBC mortality rates varies spatially. This is the first study to use ensemble machine learning with explainable algorithms to explore and visualize the spatial heterogeneity of the relationships between LBC mortality and risk factors in the contiguous USA.

Lung and bronchus cancer (LBC) is one of the most common causes of cancer death globally, accounting for 11.6% of all cancer deaths in 2018 1 . It contributes substantially to healthcare costs and the health burden globally 2 and is an insistent public health concern due to its low survival rate 3 . In the USA, the LBC mortality rate declined by 48% from 1989 to 2016 3 , but it remains the top cause of cancer-related death 4 . An estimated 142,670 Americans were expected to die from lung cancer in 2019, approximately 23 percent of all cancer deaths 3 . LBC mortality rates vary substantially between and within states in the US 3,5 . This variation has been mainly linked to variation in smoking prevalence 6 . Yet, causes of lung cancer mortality are more complex 7 and are also linked with air pollution 8 , and socioeconomic conditions 3,9 . Some of these risk factors have not been previously included in the modeling of predicting the LBC mortality rate 7,8,[10][11][12][13] .
Several statistical methods and tools have been developed to analyze and report cancer incidence and mortality statistics in the USA, including the Poisson-gamma model, the multivariate conditional autoregressive model, and Bayesian inference 14 . The state-space method (SSM) 15 and autoregressive quadratic time trend model 16 are primarily used to estimate the total number of cancer deaths expected to occur in a given period. Numerous studies have applied Geographically weighted (GW) models to explore the geographic relationship between risk factors and the LBC mortality rate 7,8,[17][18][19] . However, a traditional linear model may fail to capture complex interactions and non-linear relationships between LBC mortality and risk factors. The increasing availability of data and machine learning (ML) models present an opportunity to predict and identify the factors contributing to the LBC mortality rate and help develop a strategy for targeting areas for the management of treatment. The machine learning approach has been recently applied to other health problems such as arrhythmia detection 20 ,

Material and methods
Data. Lung and bronchus cancer (LBC) mortality rates by county. The county-level age-adjusted annual LBC mortality rates from 2013 to 2017 were obtained from the National Vital Statistics System at the National Center for Health Statistics of the Centers for Disease Control and Prevention 16,38,39 . The detailed extraction and age adjustment methods of mortality data are described elsewhere 40 . Due to data suppression for reliability and confidentiality, missing LBC mortality rate data in 348 counties in the contiguous USA counties were imputed with missForest package 41 in R. The out-of-bag (OOB) imputation error (MSE) estimate was 35 per 100,000. Finally, we created a data-frame of 3107 counties in the conterminous US. We did not include Shannon county in South Dakota due to a miss-match between the new and old FIPS codes, unique county identification numbers (Fig. 2).

Risk-factors.
We assembled a comprehensive set of county-level risk factors (Table S2) to develop models to predict county-level LBC mortality rate in the contiguous USA. These data include variables relating to lifestyles, socio-economy, demography, air pollution, and physical environments.
Cigarette smoking prevalence. Data on age-adjusted cigarette smoking prevalence by county from 2008 to 2012 was obtained from the Institute for Health Metrics and Evaluation 42 , which derived the data from the results of the Behavioral Risk Factor Surveillance System (BRFSS) by using a logistic, hierarchical, mixed-effects regression model with spatial and temporal smoothing 43 . The BRFSS is a state-based random digit dial (RDD) telephone survey conducted annually in all states, the District of Columbia, and US territories. For the year 2008 to 2012 estimation, the root means squared error for male and female cigarette smoking was 1.9 for 100 sample size 43 . Data from 2013 to 2017 were obtained from County Health Ranking 44 , who also used BRFSS survey data to estimate county averages of age-adjusted cigarette smoking (%) prevalence. Before 2016, up to seven survey years of BRFSS data were aggregated to produce county estimates. The 2016 and 2017 data were obtained from single-year 2014 and 2015 BRFSS survey data, respectively. The average (2008-2017) smoking prevalence by county is shown in Fig. S1a.
Poverty rate. The data on the average (2012-2016) annual age-adjusted poverty data (% population below poverty level) by county are shown in Fig. S1b. The data were obtained from the US Census Bureau's Small Area Income and Poverty Estimates (SAIPE) program (US Cenus 45 . The county level observations from the American Community Survey (ACS) and census data were used to predict the number of people in poverty 46 . The ACS is an ongoing survey program conducted by the Census Bureau to provide vital population and housing information across the country 47 . Uninsured percentage. Data on the portion of the population under age 65 without health insurance coverage from 2013 to 2017 (Fig. S1c) was obtained from Small Area Health Insurance Estimates (SAHIE) program 45 . The SAHIE program produces model-based health insurance coverage estimates for demographic groups within counties and states 48 . Demography. County-level demography data such as white, non-Hispanic population (%,), black or African American, non-Hispanic population (%), Hispanic/Latino population (%), and population aged 65 and older (%) were obtained from the US Census 49 . We used the 5-year means (2013-2017) of these data in our study ( Fig. S2a-d).
Air pollution. Particulate matter (PM 2.5 ). The county-level annual PM 2.5 data were derived from the daily PM 2.5 data set downloaded from the CDC data portal 50 . This county-level of 24-h average PM 2.5 concentrations was generated by the US Environmental Protection Agency (EPA) using a Bayesian spatial downscaling fusion model 51 . For each county, annual PM 2.5 from 2006 to 2016 was averaged to yield long-term yearly averages, which are mapped in Fig. S3a.
Nitrogen dioxide (NO 2 ). Population-weighted NO 2 concentrations at 0.1° × 0.1° resolution were estimated using imagery from three satellite instruments, including the Global Ozone Monitoring Experiment (GOME), Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and GOME-2 satellite in combination with the GEOS-Chem chemical transport model 52 . We resampled all raster data at a 2.5 km × 2.5 km grid size using Empirical Bayesian Kriging. We then averaged the results within each county for each year to yield a long-term annual average of NO 2 that was mapped from 2003 to 2012 (Fig. S3b).
Sulfur dioxide (SO 2 ). Gridded (1-degree spatial resolution) annual, mean SO 2 vertical column densities were obtained from time-series, multi-source SO 2 emission retrievals, and satellite SO 2 measurements from the Ozone Monitoring Instrument (OMI) on NASA's Aura satellite 53 . We resampled all raster data at a 2.5 km × 2.5 km grid size using Empirical Bayesian Kriging and then averaged the results within each county for the period from 2005 to 2015 (Fig. S3c).
Ozone. Annual county-level ozone data were derived from the Daily County-Level Ozone Concentrations downloaded from the CDC's data portal (CDC 54 , 2020). The daily data provide modeled predictions of ozone levels from the EPA's Downscaler model. The long-term average ozone concentration was generated from annual ozone data from 2006 to 2016 and mapped from 2007 to 2016 (Fig. S3d).
Urban-rural counties. The data on the division of counties into urban or rural was drawn from the National Center of Health Statistics (NCHS) data system's Urban-Rural Classification Scheme for Counties 56 . All counties were classified into six classes based on the metropolitan statistical areas (MSA) 57 . We then reclassified the counties into four major classes: large central metro, large fringe metro, medium/small metro, and nonmetro ( Fig. S4b).
Coal counties. We classified the counties into two classes (yes = coal produced, no = no coal production) according to the average coal production from 2006 to 2016 (Fig. S4c). We used data from the US Energy Information Administration and the US Mine Safety and Health Administration's annual survey of coal production by US coal mining companies 58 . Data includes coal production, company and mine information, operation type, union status, labor hours, and employee numbers.
Elevation. We used elevation data from USGS 59 . Median elevation (m) for each county (Fig. S4d) was calculated.
Analytical methods. We developed stacked ensemble models from the output of five ML models to predict and explain the county-level LBC mortality using many risk factors (Fig. 1). We applied a series of model-agnostic interpretation methods to investigate the effects of several well-known risk factors on LBC mortality rates in the US. Three spatial regression models were used to evaluate the performance of the stack-ensemble model.
Exploratory data analysis. Before developing the machine learning model, we explored spatial autocorrelation and stratified spatial heterogeneity (SSH) of LBC mortality rates. Spatial autocorrelation assessment comprises statistics describing how a variable is autocorrelated through geographical space 60 . We used Getis-Ord Gi statistics 61 to quantify spatial autocorrelation of LBC mortality rates by estimating the z-scores and p-values in each county. Larger statistically significant positive and negative z-scores indicate more intense clustering of high low values, respectively. We used ArcGIS Spatial Statistics Tools 62 to estimate Getis-Ord Gi statistics for spatial autocorrelation. We also estimated bivariate Local Moran I (LMI) statistics to explore the degree of linear www.nature.com/scientificreports/ association between LBC mortality rates and risk factors at a given location and the average of another variable at neighboring areas (spatial lag). Since our study area is vast, there is a possibility of high stratified spatial heterogeneity (SSH) which refers to a partition of a study area, where variables are homogeneous within each stratum but not between strata 63 . The q-statistic proposed by Wang et al. 63 measures the degree of SSH in geographical space related to the ratio between the variance of a variable within the strata and the pooled variance of an entire study area. The value of the q-statistic range from 0 to 1, and it increases as the strength of the SSH increases. The calculated q-statistics for all risk factors used the "factor_detector" function of "geodetector" package 63 in the R statistical computing environment 64 .
Training. Before training, the data set (n = 3,107 counties) was randomly split using stratified random sampling 65 into sub-sets of training (70%), validation (15%), and test data (15%). We used seven Gi-bins or clusters derived from Getis-Ord Gi* statistic of LBC mortality rates (Fig. 2a) as strata. The validation data was used to optimize the ML model parameters during the tuning and training processes. The test data set was used as the hold-out data to evaluate the model performance. The summary statistics and distribution of LBC mortality rate and risk factors in the training, validation, and test data sets are similar to those in the entire data set (Fig. S5a, b and Table S2).
Spatial regression models. The performance machine learning models were compared with three spatial regression models: spatial error, spatial Lag, and geographically weighted OLS (GW-OLS). A brief description of these models is given in supplementary information. For spatial regression analyses, "GWModel" 66 and "spatialreg" 67 packages in the R statistical computing environment were used 64 .  www.nature.com/scientificreports/ Machine learning base models. We trained the data with a generalized linear model (GLM), random forest (RF), Gradient boosting machine (GBM), extreme Gradient boosting machine (XGBoost), and Deep Neural Network (DNN) with several combinations of hyper-parameters. A brief description of all base learners is given in Supplementary information. During training, we used a Random Grid Search (RGS) to find the optimal parameter values for the base-learners to reduce over-fitting and enhance the prediction performance of the models 68 . The optimal hyperparameters were selected by conducting a grid-search using tenfold cross-validation (Supplementary Information Table S3). We used 0.001 and 2 for "stopping tolerance" and "stopping rounds" as early stopping parameters in the parameter tuning process. The best-performing model from each algorithm was selected according to their performance during tenfold cross-validation with different-parameters combinations. The root mean squared error (RMSE) was used as a performance matrix.
Stack-ensemble models. Ensemble machine learning with stack-generalization uses a higher-level model (meta-learner) to combine several lower-level models model as base-learners for better predict performance. Unlike the "bagging" in the random forest or "boosting" in Gradient boosting approaches that can only combine the same type of algorithm, stacked generalization can combine different algorithms to maximize the generalization accuracy. It uses the following three steps: (1) set up a list of base-learners (level-0 space) and a metalearner (level-1 space), (2) train each of the base-learners and perform K-fold cross-validation predictions for each base-learner, and (3) use these predicted values to train the meta-learner and make new predictions. The base-level models often consist of different learning algorithms, and therefore stacking ensembles often combine heterogeneous algorithms. The K-fold cross-validation outputs of all base learners were then trained with two stacked ensemble models at the end. One ensemble contains all the sub-models of five learners (n = 147), and the other includes just the best-performing model from each learner. The GLM regression model was used as a meta-learner at level 1-space. We used the "h2o" package 69 in the R statistical computing environment 64 to train, validate, and predict the GLM, RF, GBM, XGBoost, DNN, and stack-ensemble models.
Model performance. The performance of all base-learners and stack-ensemble models were evaluated with a hold-out-test data set. The diagnostic measures of prediction performance used here were the mean absolute error (MAE) (1), and the root mean square error (RMSE) (2). Also, we used observe versus predicted plots to visualize model performance and used simple linear regression between observed and predicted LBC-rates to judge model performance.
where n is the number of counties, and y and ŷ are observed and predicted LBC rates in county i.
We also calculated bias and variance of all spatial regression and ML models by resampling the training data set, repeating the model-building process, and deriving the average prediction error from the test data set. Bias represents how far away an average model prediction f (x) is far from the true f (x) , so, bias can be expressed as: The variance represents how much a model prediction changes with different training data, i.e., variation in prediction due to random sampling: So total expected error of a model prediction is composed of bias and variance: Explainable AI. The Permutation Feature Importance (PFI) approach 70 and Partial dependence plots (PDP) 71 are primarily used to explain and visualize the output from simple machine learning models. Unlike traditional statistical methods, the output of the stacked ensemble model is difficult to interpret since it combines different ML algorithms 72 . Therefore, we created several agnostic "model explainers " to interpret the stack ensemble model's output at local and global scales. The "explainers" make a unified representation of a model for further analysis 37 .
Permutation-based feature importance. We adopted the "model agnostic" Permutation Feature Importance (PFI) approach 70 , which measures the increase in the prediction error (drop-out loss or RMSE) of the model www.nature.com/scientificreports/ after the feature values are permuted by breaking down the relationship between the feature and the true outcome. This probabilistic method automatically considers interaction effects for importance calculation 73 .
Partial dependence (PD), local-dependence (LD) and accumulated-local (AL) profiles. It is not easy to interpret complex machine learning algorithms by examining their coefficients. However, a partial dependence (PD) profile can interpret a machine learning model's output and visualize how the model's predictions are influenced by each predictor when all other predictors are being controlled. In these plots, the Y-axis value ( y ) is determined by the average of all possible model prediction values when the value of the objective predictor is at the value indicated on the X-axis. Partial dependence plots can produce inaccurate interpretations if the predictors are strongly correlated 37 . As an alternative to partial dependence profiles, a new visualization approach, "accumulated local effects plots," has been proposed, which is unbiased and does not require this unreliable extrapolation with correlated predictors 74 . As accumulated-local (AL) profiles are related to local-dependence profiles (LD) 37 , both were applied to summarize the influence of an explanatory variable on the stack-ensemble model's predictions in this study.
Model agnostic greedy explanations of model predictions (breakDown). We applied the break-down variable's contribution to visualize and describe how risk factors contribute to LBC mortality rates prediction locally (at the county level). The objective of this approach is to decompose model predictions into parts that can be attributed to particular variables 75 . The "Break-down Plots" proposed by Biecek and Burzykowski 37 presents "variable attributions," i.e., "the decomposition of the model's prediction into contributions that can be attributed to different explanatory variables." We used the "DALEX" package 76 in R Statistical Computing Environment 64 to create "explainers" for PFI, PD -, LD-and AL-profiles, and local variables' contribution in the best performing stack-ensemble model prediction.

Results
Exploratory data analysis. Figure 2a shows the spatial distribution of county-level, age-adjusted LBC rates, averaged over 5 years (2013-2017). There was a total of 146,193 LBC -related deaths in the US during this period. The South and Appalachian regions had mean LBC rates during the period 1998-2012 that were much higher than the national average of 65 death per 100,000. The highest mean mortality rates were observed in Union County in Florida, followed by several counties in the Appalachian region covering Kentucky, Tennessee, and West Virginia, respectively. Counties with the lowest LBC mortality rates were observed in Summit County, Utah.
The Getis-Ord Gi* hotspot analysis identifies statistically significant clusters of counties with a high mortality rate ("hot" clusters) in the South, mainly in the Mississippi basin and the southern Appalachian region (Fig. 2b). The "cold" clusters (or areas where the mortality rate was relatively low) occurred predominantly in the Midwest and the western part of the country. There were some other small cold clusters of counties in the northeastern coastal region.
The correlations between LBC mortality rate and risk factors are weak to moderate (Fig. S6). The correlations were positive for LBC mortality rate and smoking (r = 0.623, p < 0.001), PM 2.5 (r = 0.425, p < 0.001), SO 2 (r = 0.293, p < 0.001) and poverty (r = 0.394, p < 0.001), and negative for LBC mortality rate and percent Hispanic population (r = − 0.364, p < 0.001) and median elevation (r = − 0.443, p < 0.001). The mean LBC mortality was significantly lower in the large metro area than in other areas (Fig. S7a). For radon zones groups, mean LBC mortality rates were lower in radon zones-1 (Fig. S7b). For the last 10 years, counties producing coal showed significantly higher LBC mortality rates than other counties (Fig. S7c).
The bivariate global Moran's I show a positive association between LBC mortality rates and smoking, PM 25 , SO 2, and poverty activity and a negative association between the Hispanic population and median elevation (Fig. S8). The bivariate LMI cluster of LBC mortality rates and twelve risk factors are presented in maps in Fig. S9. The red color (High-High) in maps corresponds to significant clusters of high LBC mortality rates and high prevalence of risk factors. The light red color (High-Low) in maps resembles clusters of high LBC mortality rates and low prevalence of risk factors.
To see how the risk factors explained the spatial distribution LBC mortality rate in the conterminous USA, we calculated q-statistic (strength of SSH) of 15 risk factors which were sorted in the order: Smoking > SO 2 > PM 25 > Elevation > Ozone > Poverty > Hispanic population > NO 2 > Population-65 yr > Black population > White population > Uninsured > Radon zone > Coal (yes/no) > Urban-Rural (Table 1). The q-value of smoking prevalence indicates a moderate stratified heterogeneity effect on LBC mortality rates distribution. Fourteen out of 15 variables exhibit low SSH.
Base learners turning parameters. The optimum RF model had ntrees, max_depth, and sample_rate of 576, 30, and 06, respectively. The best GBM had ntrees = 500, col_sample_rate = 0.5, max_depth = 20, min_ rows = 1.0. The best XGBoost model was found to have hyper-parameters of ntree = 350, max depth = 3, min_ row = 50, col sample rate = 75%. The DNN model had three hidden layers. Each layer had 100 neurons with a "Tanh", activation function, with very low L1 regularization and L2 regularization values to add stability and reduce the risk of over-fitting. The optimum GLM model had alpha = 0 and lambda = 1.

Performance of base learners and stack ensemble model.
The MAE values varied from 6.06 to 7.00 per 100,000, which is lower than the minimum value of the observed LBC rate, 10.1 per 100,000. All five baselearners displayed only slight differences in their RMSE statistics. Among the base-learners, the RF and GBM models performed better than all the other learners during the training stage ( www.nature.com/scientificreports/ RMSE statistics and explained more than 95% of the variability in LBC mortality rates for the training data set. However, when the models were applied to the validation data set, they had relatively high MAE and RMSE statistics, indicating problems in generalizing their results beyond the training data set (i.e., generalization error). The performance of three spatial regression models, five base-learners and two stack-ensemble models, was further evaluated using a hold-out test data set (Table 2 and Fig. S9). The stack-ensemble model with all base learners (N = 147) improved prediction over the five base models (level-0 space) and three traditional spatial regression models. The improvement in the RMSE ranged between 2 and 32%. The R 2 for the predicted versus the observed values for the test data set was 0.61 ( Table 2). None of the base-learners successfully predicted the lowest and highest LBC rates for the hold-out test data, and they over-estimated low-values and under-estimated higher values (Fig. S10a-j).
When all models were rerun with ten randomly sampled trained data sets and validated with a test data set, we found the bias 2 of RF, GBM, and the stack-estimable with all base-learners were significantly lower than other models (Fig. S11a). However, the prediction variance of these models with different training data sets was high (Fig. S11b). The highest bias 2 and the lowest variance were obtained with the spatial lag, GLM, and spatial error models.
Permutation-based variable importance. The Feature Importance (the factor by which the RMSE is increased compared to the original model if a particular feature is permuted) of the best stack-ensemble model is shown in Fig. 3. Among the 15 risk factors, total smoking prevalence was identified as the most important variable, followed by poverty rate, elevation, percent white population, and PM 2.5 in the contiguous US. Figure 4 shows partial-dependence, local-dependence, and accumulated-local profile plots of six important risk factors. Partial dependence plots help us understand the marginal effect of a feature (or subset thereof) on the predicted outcome. PD profiles offer a simple way to summarize a particular risk factor's effect on the LBC mortality rate. When other predictors were controlled for, the effects of smoking prevalence (Fig. 4a), poverty (Fig. 4b), percentage white population (Fig. 4d), and PM 2.5 (Fig. 4f) showed a positive effect (blue lines) on predicted LBC mortality rates. However, elevation (Fig. 4c) and percentage Hispanic population (Fig. 4e) have a strong negative effect on expected LBC mortality rates.

Partial dependence (PD), local-dependence (LD), and accumulated-local (AL) profiles.
Accumulated-local profiles are helpful in summarize an explanatory variable's influence on the model's predictions when explanatory variables are correlated. When the model is additive but, explanatory variables are correlated, neither PD nor LD profiles will adequately capture the explanatory variable's effect on the model's predictions 37 . However, the AL profile will provide a correct summary of the impact of variables on prediction. The AL and PD profiles (blue-lines Fig. 5) parallel each other for all six risk factors, suggesting that the stackensemble model is additive for these six explanatory variables.
The contour plot in Fig. 5 shows the dependence of the LBC mortality rate on the joint values of two risk factors when the effects of other risk factors are being controlled. When the average smoking prevalence is lower than ~ 30%, LBC rates are nearly independent of poverty, whereas, for smoking prevalence rates greater than ~ 30%, a strong dependence on poverty was observed (Fig. 5a). Similar positive interactions between smoking and the percent white population (Fig. 5b) and smoking and PM 2.5 (Fig. 5d) were observed; since increases in these risk factors are associated with an increase in the LBC mortality rate. However, smoking prevalence and percent Hispanic population (Fig. 5c) and PM 2.5 and Elevation (Fig. 5e) interacted in opposite ways in prediction. to understand, and several risk factors' contributions can be presented in a limited space. The BD. plots can be used to show "variable attributions," i.e., the decomposition of the model's prediction into contributions that can be attributed to different explanatory variables 37 . We selected two counties, Summit, Utah, and Union County, Florida, to explore the contribution of risk factors in two contrasting environments because the lowest and highest LBC mortality rates were observed in these counties. The median elevation in Summit and Union Counties are 2,587 and 47 m, respectively, and the prevalence of smoking and poverty in Summit County is lower than in Union county. The red and green bars in Fig. 6 indicate negative and positive changes in the mean predictions attributed to the risk factors. The most considerable negative contributions to predicting the LBC mortality rate for Summit County, Utah, come from elevation, smoking, and poverty (Fig. 6a). The contributions of the remaining other risk factors are smaller (in absolute values). For Union County, Florida, the predicted LBC mortality rate is attributed to the positive contribution of smoking, poverty, PM 2.5 , and radon-zone (Fig. 6b). Figure 7 shows the spatial variability of the contribution of six risk factors for predicting LBC mortality rates in 3107 counties. A high positive contribution of smoking was observed in many counties in the Appalachians and the Mississippi Valley in the South and in the states of Missouri and Oklahoma (Fig. 7a). Poverty is identified as an important contributor in a large number of counties (Fig. 7b). The counties with high contributions from poverty on the LBC mortality rate are concentrated in the Appalachians and the Mississippi Valley in the South (Fig. 7b). Elevation, which is ranked the third most important risk factor overall, contributed negatively in many counties in the mountain area in the West, and Appalachian regions in the South, and the North East (Fig. 7c). In large numbers of counties in the Mid-West, North-East, and the Appalachian region in the South, percent white pollution showed a positive contribution to the predicted LBC mortality rate (Fig. 7d). A relatively Table 2. Mean absolute error (MAE), root mean squared error (RMSE) and the goodness of fit (R 2 ) during the training, validation, and testing stages. GLM generalized linear model, RF random forest, GBM graditent boosting machine, XGBoost eXtreme Gradient Boosting (XGBoost), DNN deep neural networks, GW-OLS geographically weighted OLS regression. www.nature.com/scientificreports/ higher Hispanic population negatively contributed to LBC mortality rate prediction in several counties in Texas, California, and New Mexico (Fig. 7e). Counties with a relatively low but positive contribution from PM 2.5 are mostly located in the "Rust Belt" region in the Northeastern and Midwestern of the US and Appalachian areas in the South (Fig. 7f).

Discussion
We demonstrated the potential use of stack-ensemble ML models and XAI to quantify and visualize the spatial variability of several risk factors' contributions to the LBC mortality rate across the conterminous USA. Geographically weighted (GW) models have widely been used to explore this relationship between risk factors and the LBC mortality rate 7,8,[17][18][19] . However, GW models have limitations in exploring the spatial relationship since local regression coefficients are derived in locations (e.g., counties) based on the proximate area of interest and number of neighbors 77 . To overcome this limitation, XAI with local model-agnostic interpretability and breakdown plots 37 shows promise to explore risk factors' contribution to spatial variability LBC mortality rates. In general, interpretable MI falls into two broad categories: personalized or prediction-level interpretation and dataset-or population-level interpretation, known as local and global interpretations, respectively 28 . The permutation-based feature importance, a global level-interpretation, identified smoking prevalence as the most important risk factor for LBC mortality. However, break-down plots of local model-agnostic showed a spatial variation in smoking's contributions to LBC mortality rate across the conterminous USA. In general, counties in the southern states, particularly in the Appalachian region and Mississippi Valley, have high smoking prevalence and LBC mortality rates 3,78-80 . The probability of smoking was strongly associated with compositional covariates: poverty, education, occupation, age, sex, race/ethnicity, nativity, employment, marital status, and household size 81 . Although cigarette smoking prevalence declined from 20.9% in 2005 to 14.0% in 2017, smoking is still a major cause of disease and death in the USA, accounting for more than 480,000 deaths every year, or about 1 in 5 deaths 4 . The high LBC mortality rates in the Appalachian region and Mississippi Valley can also be partly explained by high poverty rates, limited healthcare access, low educational attainment, and coal mining 82,83 . We identified county-level poverty rate as the second most important risk factor for LBC mortality across the contiguous US. Multivariate PD profile plots reveal a positive interaction between smoking and poverty rates since increasing both features leads to increased LBC mortality rates. The relationship between socioeconomic status and LBC mortality rates in the US is well established 13,82,84,85 . Access to health care is an economic issue, particularly in the US 7 . The socioeconomic status, such as poverty, determines early diagnosis and treatment and reduces the risk of death from LBC 86 . Percent population access to health insurance which is linked with poverty contributed strongly in predicting a high LBC mortality rate in Union County, Florida, which has the highest national LBC mortality rate.
Lung cancer incidence and mortality across the US were associated with the demographic composition 87 . In this study, we found that the percentage of the white or Hispanic population contributed positively and negatively, respectively, to the LBC mortality rate. Counties with a high proportion of white people in the Mid-West, North-East, and the Appalachian region in the South had higher LBC mortality rates than counties in the West www.nature.com/scientificreports/ with a relatively high proportion of Hispanics. Hispanics in the US have about a 50% lower incidence rate for lung cancer than the non-Hispanic white population 88 . Their presence contributed to lower LBC mortality rates in the western US generally 7 . The lower LBC incidence and mortality rates in this region are probably due to lower smoking rates in the Hispanic population 88 . We found a negative association between county-level smoking prevalence and the Hispanic population (r = − 0.315, p < 0.001). After smoking and poverty, median elevation ranked third in predicting LBC mortality nationally. In many mountainous counties in the West and North-East, elevation showed a negative contribution in prediction, which is consistent with the conceptual model of the impact of elevation on LBC mortality rates 89 and the study of Kerry et al. 7 . Low atmospheric oxygen in higher elevation areas acts as an inhibitor of free radical damage and tumorigenesis, which may be responsible for low incidence respiratory cancers across the US's mountainous counties 89 .
The overall association between the LBC mortality rate and PM 2.5 and SO 2 was positive among the four air pollutants. The shared geographic area of high LBC mortality rate, smoking, poverty, and air pollution (PM 2.5 and SO 2 ) in the southeast and the Appalachian region indicate the association of these risk factors with higher LBC mortality rates. Other factors, such as poor diet, genetic susceptibility, and occupational exposures, may act independently or in concert with smoking or air pollution in determining LBC incidence and mortality 90 . Inferior air quality in these regions may synergistically contribute to a higher risk of lung cancer or respiratory illness 91,92 .
This study has some limitations. The county-level data inherent limitations since data are model-based estimates from the BRFSS telephone survey 93 . Furthermore, the LBC rate data used in this study contain errors due to the under-recording of lung cancer deaths, errors in population count, and covariates used in modeling. Besides the limitation of the data, the "post-doc explainable ML" model has some limitations 29 . The XAI is usually not suggested for high-stack discussion making due to its unreliable and unrealistic explanation of what the original model computes. However, it is recently being used in health sectors 36,94,95 . Very recently stack-ensemble model with model agnostic methods has been applied to identify factors influencing childhood blood lead levels 72 .

Conclusions
To our knowledge, this study is the first one to apply XAI as "model greedy agnostic explanations of model predictions" or "break-down plot" 37 in a stack-ensemble framework to explore the spatial variability of the contribution of several risk factors to LBC mortality rates. Application of XAI for understanding the spatial variability of the associations between LBC mortality rates and the risk factors may allow advanced research and policy development to understand underlying, spatially varying contributors to LBC mortality across US counties. This study shows strong potential for implementing XAI as a complement to or substitute for the traditional spatial regression models. This study's findings may lead to more tailored and effective prevention strategies from a policy perspective, which is critical, given the projected prevalence growth of LBC mortality rates in the coming decades.

Data availability
The data sets generated during this study are available from the corresponding author upon reasonable request.